The Lovász Hinge: A Novel Convex Surrogate for Submodular Losses
Learning with non-modular losses is an important problem when sets of
predictions are made simultaneously. The main tools for constructing convex
surrogate loss functions for set prediction are margin rescaling and slack
rescaling. In this work, we show that these strategies lead to tight convex
surrogates iff the underlying loss function is increasing in the number of
incorrect predictions. However, gradient or cutting-plane computation for these
functions is NP-hard for non-supermodular loss functions. We propose instead a
novel surrogate loss function for submodular losses, the Lovász hinge, which
leads to O(p log p) complexity with O(p) oracle accesses to the loss function
to compute a gradient or cutting-plane. We prove that the Lovász hinge is
convex and yields an extension. As a result, we have developed the first
tractable convex surrogates in the literature for submodular losses. We
demonstrate the utility of this novel convex surrogate through several set
prediction tasks, including on the PASCAL VOC and Microsoft COCO datasets.
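
The sort-then-query procedure behind the quoted O(p log p) cost can be sketched as follows. This is a minimal illustration of the plain Lovász extension of a set function, not the authors' implementation: the names `lovasz_extension` and `sqrt_count_loss` are hypothetical, the example loss is an assumption chosen only because it is submodular, and the thresholding that turns the extension into a hinge is omitted.

```python
from math import sqrt

def lovasz_extension(set_loss, s):
    """Evaluate the Lovász extension of `set_loss` at the vector `s`.

    `set_loss` maps a frozenset of indices (the mispredicted elements)
    to a real number. Returns (value, gradient).
    """
    p = len(s)
    # Sort indices by decreasing component of s: the O(p log p) step.
    order = sorted(range(p), key=lambda i: s[i], reverse=True)
    grad = [0.0] * p
    chosen = set()
    prev = set_loss(frozenset())
    value = 0.0
    for i in order:
        chosen.add(i)
        cur = set_loss(frozenset(chosen))  # one oracle call per element
        grad[i] = cur - prev               # marginal gain of adding i
        value += s[i] * grad[i]
        prev = cur
    return value, grad

def sqrt_count_loss(errors):
    # Hypothetical submodular loss: concave function of the error count.
    return sqrt(len(errors))

value, grad = lovasz_extension(sqrt_count_loss, [0.5, 2.0, 1.0])
indicator_value, _ = lovasz_extension(sqrt_count_loss, [1.0, 0.0, 1.0])
```

On a 0/1 indicator vector the result coincides with the set loss itself, which is the extension property mentioned in the abstract. Copying the set at every step makes this toy version slower than the stated complexity; a practical version would pass incremental state to the oracle.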
B-tests: Low Variance Kernel Two-Sample Tests
A family of maximum mean discrepancy (MMD) kernel two-sample tests is
introduced. Members of the test family are called Block-tests or B-tests, since
the test statistic is an average over MMDs computed on subsets of the samples.
The choice of block size allows control over the tradeoff between test power
and computation time. In this respect, the B-test family combines favorable
properties of previously proposed MMD two-sample tests: B-tests are more
powerful than a linear time test where blocks are just pairs of samples, yet
they are more computationally efficient than a quadratic time test where a
single large block incorporating all the samples is used to compute a
U-statistic. A further important advantage of the B-tests is their
asymptotically Normal null distribution: this is by contrast with the
U-statistic, which is degenerate under the null hypothesis, and for which
estimates of the null distribution are computationally demanding. Recent
results on kernel selection for hypothesis testing transfer seamlessly to the
B-tests, yielding a means to optimize test power via kernel choice.
Comment: Neural Information Processing Systems (2013)
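
A rough sketch of the block-averaged statistic described above, assuming scalar samples and a Gaussian kernel. The function names and the bandwidth default are illustrative assumptions, and the Normal-null thresholding step is left out.

```python
import math

def gaussian_kernel(a, b, bandwidth=1.0):
    # Gaussian (RBF) kernel on scalars; the bandwidth is an assumed default.
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))

def block_mmd2(xs, ys, kernel=gaussian_kernel):
    """Unbiased MMD^2 estimate on one block of paired samples."""
    b = len(xs)
    total = 0.0
    for i in range(b):
        for j in range(b):
            if i != j:
                total += (kernel(xs[i], xs[j]) + kernel(ys[i], ys[j])
                          - kernel(xs[i], ys[j]) - kernel(xs[j], ys[i]))
    return total / (b * (b - 1))

def b_test_statistic(xs, ys, block_size, kernel=gaussian_kernel):
    """Average the per-block MMD^2 estimates over disjoint blocks."""
    if block_size < 2:
        raise ValueError("block_size must be at least 2")
    n_blocks = min(len(xs), len(ys)) // block_size
    if n_blocks == 0:
        raise ValueError("not enough samples for one block")
    values = [
        block_mmd2(xs[k * block_size:(k + 1) * block_size],
                   ys[k * block_size:(k + 1) * block_size], kernel)
        for k in range(n_blocks)
    ]
    return sum(values) / n_blocks, values

same_stat, _ = b_test_statistic([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0], 2)
shift_stat, shift_blocks = b_test_statistic(
    [0.0, 0.1, 0.2, 0.3], [5.0, 5.1, 5.2, 5.3], 2)
```

With `block_size=2` this reduces to the linear-time test on pairs, and with a single block covering all samples it reduces to the quadratic-time U-statistic, which is the tradeoff the abstract describes.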
A Note on k-support Norm Regularized Risk Minimization
The k-support norm has been recently introduced to perform correlated
sparsity regularization. Although Argyriou et al. only reported experiments
using squared loss, here we apply it to several other commonly used settings
resulting in novel machine learning algorithms with interesting and familiar
limit cases. Source code for the algorithms described here is available
Testing for Differences in Gaussian Graphical Models: Applications to Brain Connectivity
Functional brain networks are well described and estimated from data with
Gaussian Graphical Models (GGMs), e.g. using sparse inverse covariance
estimators. Comparing functional connectivity of subjects in two populations
calls for comparing these estimated GGMs. Our goal is to identify differences
in GGMs known to have similar structure. We characterize the uncertainty of
differences with confidence intervals obtained using a parametric distribution
on parameters of a sparse estimator. Sparse penalties enable statistical
guarantees and interpretable models even in high-dimensional and low-sample
settings. Characterizing the distributions of sparse models is inherently
challenging as the penalties produce a biased estimator. Recent work invokes
the sparsity assumptions to effectively remove the bias from a sparse estimator
such as the lasso. These distributions can be used to give confidence intervals
on edges in GGMs, and by extension their differences. However, in the case of
comparing GGMs, these estimators do not make use of any assumed joint structure
among the GGMs. Inspired by priors from brain functional connectivity we derive
the distribution of parameter differences under a joint penalty when parameters
are known to be sparse in the difference. This leads us to introduce the
debiased multi-task fused lasso, whose distribution can be characterized in an
efficient manner. We then show how the debiased lasso and multi-task fused
lasso can be used to obtain confidence intervals on edge differences in GGMs.
We validate the techniques proposed on a set of synthetic examples as well as
a neuro-imaging dataset created for the study of autism.
Learning to Discover Sparse Graphical Models
We consider structure discovery of undirected graphical models from
observational data. Inferring likely structures from few examples is a complex
task often requiring the formulation of priors and sophisticated inference
procedures. Popular methods rely on estimating a penalized maximum likelihood
of the precision matrix. However, in these approaches structure recovery is an
indirect consequence of the data-fit term, the penalty can be difficult to
adapt for domain-specific knowledge, and the inference is computationally
demanding. By contrast, it may be easier to generate training samples of data
that arise from graphs with the desired structure properties. We propose here
to leverage this latter source of information as training data to learn a
function, parametrized by a neural network, that maps empirical covariance
matrices to estimated graph structures. Learning this function brings two
benefits: it implicitly models the desired structure or sparsity properties to
form suitable priors, and it can be tailored to the specific problem of edge
structure discovery, rather than maximizing data likelihood. Applying this
framework, we find that our learnable graph-discovery method trained on
synthetic data generalizes well, identifying relevant edges in both synthetic
and real data that were completely unknown at training time. We find that on genetics, brain
imaging, and simulation data we obtain performance generally superior to
analytical methods.
A low variance consistent test of relative dependency
We describe a novel non-parametric statistical hypothesis test of relative
dependence between a source variable and two candidate target variables. Such a
test enables us to determine whether one source variable is significantly more
dependent on a first target variable or a second. Dependence is measured via
the Hilbert-Schmidt Independence Criterion (HSIC), resulting in a pair of
empirical dependence measures (source-target 1, source-target 2). We test
whether the first dependence measure is significantly larger than the second.
Modeling the covariance between these HSIC statistics leads to a provably more
powerful test than the construction of independent HSIC statistics by
sub-sampling. The resulting test is consistent and unbiased, and (being based
on U-statistics) has favorable convergence properties. The test can be computed
in quadratic time, matching the computational complexity of standard empirical
HSIC estimators. The effectiveness of the test is demonstrated on several
real-world problems: we identify language groups from a multilingual corpus,
and we prove that tumor location is more dependent on gene expression than on
chromosomal imbalances. Source code is available for download at
https://github.com/wbounliphone/reldep.
Comment: International Conference on Machine Learning, Jul 2015, Lille, France
The Lovász-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks
The Jaccard index, also referred to as the intersection-over-union score, is
commonly employed in the evaluation of image segmentation results given its
perceptual qualities, scale invariance - which lends appropriate relevance to
small objects, and appropriate counting of false negatives, in comparison to
per-pixel losses. We present a method for direct optimization of the mean
intersection-over-union loss in neural networks, in the context of semantic
image segmentation, based on the convex Lovász extension of submodular
losses. The loss is shown to perform better with respect to the Jaccard index
measure than the traditionally used cross-entropy loss. We show quantitative
and qualitative differences between optimizing the Jaccard index per image
versus optimizing the Jaccard index taken over an entire dataset. We evaluate
the impact of our method in a semantic segmentation pipeline and show
substantially improved intersection-over-union segmentation scores on the
Pascal VOC and Cityscapes datasets using state-of-the-art deep learning
segmentation architectures.
Comment: Accepted as a conference paper at CVPR 201
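
The idea of extending the Jaccard loss from discrete mispredictions to continuous per-pixel errors can be sketched for a single foreground class. This is a simplified illustration under stated assumptions: the name `lovasz_jaccard` is hypothetical, the inputs are plain Python lists, and the per-class softmax errors and averaging used in a full segmentation pipeline are not shown.

```python
def lovasz_jaccard(errors, labels):
    """Lovász extension of the Jaccard loss for one foreground class.

    errors[i] is a non-negative per-pixel error and labels[i] is 1 for
    foreground, 0 for background.
    """
    # Visit pixels from largest to smallest error.
    order = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
    n_fg = sum(labels)
    false_neg = 0   # foreground pixels counted as mispredicted so far
    false_pos = 0   # background pixels counted as mispredicted so far
    prev = 0.0
    loss = 0.0
    for i in order:
        if labels[i] == 1:
            false_neg += 1
        else:
            false_pos += 1
        # Jaccard loss (1 - IoU) if exactly the visited pixels were wrong.
        cur = 1.0 - (n_fg - false_neg) / (n_fg + false_pos)
        loss += errors[i] * (cur - prev)
        prev = cur
    return loss

# One missed foreground pixel and one false positive: IoU = 1/3.
discrete = lovasz_jaccard([1.0, 0.0, 1.0, 0.0], [1, 1, 0, 0])
perfect = lovasz_jaccard([0.0, 0.0, 0.0, 0.0], [1, 1, 0, 0])
all_wrong = lovasz_jaccard([1.0, 1.0, 1.0, 1.0], [1, 1, 0, 0])
```

When every error is 0 or 1 the value equals the ordinary Jaccard loss of the corresponding misprediction set, so the continuous loss agrees with the discrete measure at the corners.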